Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework have achieved remarkable success on various TOD benchmarks. However, we argue that current TOD benchmarks are only limited surrogates for real-world scenarios and that current TOD models are still a long way from handling those scenarios. In this position paper, we first identify the current status and limitations of SF-TOD systems. We then explore the WebTOD framework, an alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system, powered by a large-scale language model, learns to understand the web/mobile interface that the human agent interacts with.
translated by 谷歌翻译
Standard empirical risk minimization (ERM) can underperform on certain minority groups (e.g., waterbirds on land or landbirds on water) due to the spurious correlation between the input and its label. Several studies have improved the worst-group accuracy by focusing on high-loss samples. The hypothesis behind this is that such high-loss samples are \textit{spurious-cue-free} (SCF) samples. However, these approaches can be problematic, since in real-world scenarios the high-loss samples may also be samples with noisy labels. To resolve this issue, we utilize the predictive uncertainty of a model to improve the worst-group accuracy under noisy labels. To motivate this, we theoretically show that the high-uncertainty samples are the SCF samples in the binary classification problem. This theoretical result implies that predictive uncertainty is an adequate indicator for identifying SCF samples in a noisy-label setting. Motivated by this, we propose a novel ENtropy based Debiasing (END) framework that prevents models from learning the spurious cues while being robust to noisy labels. In the END framework, we first train the \textit{identification model} to obtain the SCF samples from a training set using its predictive uncertainty. Then, another model is trained on the dataset augmented with an oversampled SCF set. The experimental results show that our END framework outperforms other strong baselines on several real-world benchmarks that consider both noisy labels and spurious cues.
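The two END stages described above (identify SCF samples by predictive uncertainty, then oversample them) can be illustrated with a minimal sketch for the binary case. The abstract does not give the selection rule, so the entropy threshold, the oversampling factor, and the function names below are all hypothetical choices made for illustration only.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in nats) of a Bernoulli prediction with P(y=1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def select_scf_indices(probs, threshold):
    """Return indices whose predictive entropy is at least `threshold`.

    `probs` are the identification model's predicted P(y=1) per sample.
    The fixed threshold is an assumption, not the paper's stated rule.
    """
    return [i for i, p in enumerate(probs) if binary_entropy(p) >= threshold]

def oversample(dataset, scf_indices, factor):
    """Append `factor` extra copies of each selected sample to the dataset."""
    extra = [dataset[i] for i in scf_indices] * factor
    return list(dataset) + extra

# Hypothetical predictions from an identification model.
probs = [0.99, 0.5, 0.02, 0.6]
scf = select_scf_indices(probs, threshold=0.6)
augmented = oversample(["a", "b", "c", "d"], scf, factor=2)
```

Here the confident predictions (0.99 and 0.02) have low entropy and are left alone, while the two uncertain samples are duplicated before the second model is trained.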
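Volumetric images from magnetic resonance imaging (MRI) provide invaluable information for the preoperative staging of rectal cancer. Above all, accurate preoperative discrimination between the T2 and T3 stages is arguably the most challenging and clinically significant task in rectal cancer treatment, because chemotherapy is usually recommended for patients with T3 (or greater) stage cancer. In this study, we present a volumetric convolutional neural network that accurately discriminates T2 from T3 stage rectal cancer using rectal MR volumes. Specifically, we propose 1) a custom ResNet-based volume encoder that models the inter-slice relationship with late fusion (i.e., 3D convolution at the last layer), 2) a bilinear computation that aggregates the features produced by the encoder into a volume-wise feature, and 3) a joint minimization of triplet loss and focal loss. Using MR volumes of pathologically confirmed T2/T3 rectal cancer, we perform extensive experiments to compare various designs within the residual learning framework. As a result, our network achieves an AUC of 0.831, which is higher than the accuracy of a group of professional radiologists. We believe the method can be extended to other volume analysis tasks.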
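As a rough illustration of point 3, the sketch below combines the standard textbook forms of focal loss and triplet loss for a single sample. The abstract does not state how the two terms are weighted or which hyperparameters are used, so `weight`, `gamma`, and `margin` are assumed values, and the class-balancing factor sometimes used with focal loss is omitted.

```python
import math

def focal_loss(p_true, gamma=2.0):
    """Focal loss for one sample; p_true is the predicted probability of the correct class."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on Euclidean distances: the positive should be closer than the negative by `margin`."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def joint_loss(p_true, anchor, positive, negative, weight=1.0, gamma=2.0, margin=1.0):
    """Weighted sum of the two terms; the weighting scheme is an assumption."""
    return focal_loss(p_true, gamma) + weight * triplet_loss(anchor, positive, negative, margin)

# Toy volume-wise features: the positive shares the anchor's stage, the negative does not.
loss = joint_loss(0.5, anchor=[0.0, 0.0], positive=[1.0, 0.0], negative=[3.0, 4.0])
```

In this toy case the negative is already far enough from the anchor, so the triplet term is zero and only the focal term contributes.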
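Understanding the vision and language representations of product content is vital for search and recommendation applications in e-commerce. As a backbone for online shopping platforms, and inspired by the recent success of representation learning research, we propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images. We present the techniques we used to train large-scale representation learning models and share solutions that address domain-specific challenges. We study the performance of our pre-trained model as a backbone for diverse downstream tasks, including category classification, attribute extraction, product matching, product clustering, and adult product recognition. Experimental results show that our proposed method outperforms both single-modality and multi-modality baselines in every downstream task.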
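The abstract does not specify the contrastive objective. One common way to align text and image encoders is a symmetric InfoNCE loss over a batch of matched pairs, sketched below in plain Python. The loss form, the `temperature` value, and all function names are assumptions for illustration, not the authors' stated method.

```python
import math

def _normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(text_embs, image_embs, temperature=0.07):
    """Symmetric InfoNCE: pair i (text i, image i) is the only positive in the batch."""
    texts = [_normalize(v) for v in text_embs]
    images = [_normalize(v) for v in image_embs]
    logits = [
        [sum(a * b for a, b in zip(t, im)) / temperature for im in images]
        for t in texts
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(transposed))

# Matched pairs give a much lower loss than mismatched pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss pulls each product's text and image embeddings together while pushing apart embeddings of different products in the same batch.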
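Evidential regression networks (ENet) estimate a continuous target and its predictive uncertainty without costly Bayesian model averaging. However, the target may be predicted inaccurately because of the gradient shrinkage problem of the ENet's original loss function, the negative log marginal likelihood (NLL) loss. In this paper, the objective is to improve the prediction accuracy of the ENet while maintaining its efficient uncertainty estimation by resolving the gradient shrinkage problem. A multi-task learning (MTL) framework, referred to as MT-ENet, is proposed to accomplish this aim. In the MTL, we define a Lipschitz-modified mean squared error (MSE) loss function as an additional loss and add it to the existing NLL loss. The Lipschitz-modified MSE loss is designed to mitigate gradient conflict with the NLL loss by dynamically adjusting its Lipschitz constant. By doing so, the Lipschitz MSE loss does not disturb the uncertainty estimation of the NLL loss. The MT-ENet enhances the predictive accuracy of the ENet without losing uncertainty estimation capability on a synthetic dataset and on real-world benchmarks, including drug-target affinity (DTA) regression. Furthermore, the MT-ENet shows remarkable calibration and out-of-distribution detection capability on the DTA benchmarks.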
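Humans usually hold conversations by drawing on prior knowledge about a topic and background information about the people they are talking to. However, existing conversational agents and datasets do not consider such comprehensive information, and thus they are limited in generating utterances in which knowledge and persona are properly fused. To address this issue, we introduce the call For Customized conversation (FoCus) dataset, in which customized answers are built from the user's persona and Wikipedia knowledge. To evaluate the ability of pre-trained language models to produce informative and customized utterances, we utilize BART and GPT-2 as well as transformer-based models. We assess their generation abilities with automatic scores and conduct human evaluations for qualitative results. We examine whether the models reflect adequate persona and knowledge using our two proposed sub-tasks, persona grounding (PG) and knowledge grounding (KG). Moreover, we show through a grounding quality assessment that the utterances in our data are constructed with the proper knowledge and persona.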
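We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks, including topic classification, semantic textual similarity, natural language inference, named entity recognition, relation extraction, dependency parsing, machine reading comprehension, and dialogue state tracking. We build all of the tasks from scratch from diverse source corpora while respecting copyrights, to ensure accessibility for anyone without any restrictions. With ethical considerations in mind, we carefully design the annotation protocols. Along with the benchmark tasks and data, we provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. We also release the pretrained language models (PLMs) KLUE-BERT and KLUE-RoBERTa to help reproduce baseline models on KLUE and thereby facilitate future research. We make a few interesting observations from preliminary experiments with the proposed KLUE benchmark suite, which already demonstrate its usefulness. First, we find that KLUE-RoBERTa-large outperforms the other baselines, including multilingual PLMs and existing open-source Korean PLMs. Second, we see minimal degradation in performance even when we replace personally identifiable information in the pretraining corpus, suggesting that privacy and NLU capability are not at odds with each other. Lastly, we find that using BPE tokenization in combination with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection, and generation. In addition to accelerating Korean NLP research, our comprehensive documentation on creating KLUE will facilitate the creation of similar resources for other languages in the future. KLUE is available at https://klue-benchmark.com.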
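A carefully designed radiofrequency (RF) pulse plays a key role in many systems, such as mobile phones, radar, and magnetic resonance imaging. However, the design of an RF waveform is usually an inverse problem with no general solution. As a result, various design methods, each with a specific purpose, have been developed based on the intuition of human experts. In this work, we propose an artificial intelligence (AI)-powered RF pulse design framework, DeepRF, which utilizes the self-learning characteristics of deep reinforcement learning to generate novel RF pulses. The effectiveness of DeepRF is demonstrated using four commonly used types of RF pulses. The DeepRF-designed pulses successfully satisfy the design criteria while reporting reduced energy. Analyses show that the pulses exploit new mechanisms of magnetization manipulation, suggesting the potential of DeepRF to discover unseen design dimensions beyond human intuition. This work may lay the foundation for the emerging field of AI-driven RF waveform design.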
The automated segmentation and tracking of macrophages during their migration are challenging tasks due to their dynamically changing shapes and motions. This paper proposes a new algorithm to achieve automatic cell tracking in time-lapse microscopy macrophage data. First, we design a segmentation method employing space-time filtering, local Otsu's thresholding, and the SUBSURF (subjective surface segmentation) method. Next, the partial trajectories of cells that overlap in the temporal direction are extracted from the segmented images. Finally, the extracted trajectories are linked by considering their direction of movement. The segmented images and the trajectories obtained with the proposed method are compared with those from semi-automatic segmentation and manual tracking. The proposed tracking achieved an accuracy of 97.4% on macrophage data under challenging conditions, including feeble fluorescent intensity and the irregular shapes and motion of macrophages. We expect that the automatically extracted trajectories can provide evidence of how macrophages migrate depending on their polarization modes in situations such as wound healing.
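For readers unfamiliar with the thresholding step, the sketch below shows Otsu's method (choosing the threshold that maximizes between-class variance) applied independently to non-overlapping tiles of an image. The tiling scheme, the `block` parameter, and the function names are assumptions for illustration; the abstract does not describe how the local windows are defined, and the space-time filtering and SUBSURF stages are not covered here.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance.

    `pixels` are integers in [0, levels); values <= t count as background.
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def local_otsu_mask(image, block):
    """Threshold each block x block tile of a 2D image with its own Otsu threshold.

    Caveat of this simple tiling: a tile with a single intensity falls back to t = 0.
    """
    height, width = len(image), len(image[0])
    mask = [[False] * width for _ in range(height)]
    for r0 in range(0, height, block):
        for c0 in range(0, width, block):
            rows = range(r0, min(r0 + block, height))
            cols = range(c0, min(c0 + block, width))
            t = otsu_threshold([image[r][c] for r in rows for c in cols])
            for r in rows:
                for c in cols:
                    mask[r][c] = image[r][c] > t
    return mask

# The left tile is bright, the right tile is dim; each gets its own threshold.
image = [[10, 200, 50, 90],
         [10, 200, 50, 90]]
mask = local_otsu_mask(image, block=2)
```

Thresholding per tile lets dim cells in a low-intensity region be separated from their local background, which a single global threshold could miss.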
Understanding the informative structures of scenes is essential for low-level vision tasks. Unfortunately, it is difficult to obtain a concrete visual definition of the informative structures because the influences of visual features are task-specific. In this paper, we propose a single general neural network architecture for extracting task-specific structure guidance for scenes. To do this, we first analyze traditional spectral clustering methods, which compute a set of eigenvectors to model a segmented graph forming small compact structures on image domains. We then unfold the traditional graph-partitioning problem into a learnable network, named \textit{Scene Structure Guidance Network (SSGNet)}, to represent the task-specific informative structures. The SSGNet yields a set of coefficients of eigenvectors that produces explicit feature representations of image structures. In addition, our SSGNet is lightweight ($\sim$ 55K parameters) and can be used as a plug-and-play module for off-the-shelf architectures. We optimize the SSGNet without any supervision by proposing two novel training losses that enforce task-specific scene structure generation during training. Our main contribution is to show that such a simple network can achieve state-of-the-art results for several low-level vision applications, including joint upsampling and image denoising. We also demonstrate that our SSGNet generalizes well to unseen datasets compared to existing methods that use structural embedding frameworks. Our source code is available at https://github.com/jsshin98/SSGNet.